Overview

Dataset statistics

Number of variables: 20
Number of observations: 39759
Missing cells: 16212
Missing cells (%): 2.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 6.1 MiB
Average record size in memory: 160.0 B

Variable types

NUM: 16
BOOL: 2
CAT: 2

Reproduction

Analysis started: 2020-06-06 12:26:13.678057
Analysis finished: 2020-06-06 12:27:08.357995
Duration: 54.68 seconds
Version: pandas-profiling v2.8.0
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]
Download configuration: config.yaml

Warnings

DATE has a high cardinality: 9942 distinct values [High cardinality]
X_3 is highly correlated with X_2 [High correlation]
X_2 is highly correlated with X_3 [High correlation]
MULTIPLE_OFFENSE has 15903 (40.0%) missing values [Missing]
X_10 is highly skewed (γ1 = 30.92348051) [Skewed]
X_12 is highly skewed (γ1 = 26.54109191) [Skewed]
INCIDENT_ID has unique values [Unique]
X_1 has 31814 (80.0%) zeros [Zeros]
X_4 has 5588 (14.1%) zeros [Zeros]
X_5 has 7908 (19.9%) zeros [Zeros]
X_7 has 5794 (14.6%) zeros [Zeros]
X_8 has 14634 (36.8%) zeros [Zeros]
X_11 has 4268 (10.7%) zeros [Zeros]
X_12 has 8517 (21.4%) zeros [Zeros]
X_14 has 458 (1.2%) zeros [Zeros]
X_15 has 1680 (4.2%) zeros [Zeros]
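
The Skewed and Zeros entries above rest on two simple quantities: the skewness γ1 of a column and its share of zero values. The following is a minimal sketch of both in plain Python. The function names and the `skew_limit` and `zero_limit` thresholds are hypothetical and chosen for illustration; they are not taken from the pandas-profiling configuration, and the moment-based skewness below omits the small-sample bias correction that some libraries apply.

```python
def skewness(values):
    """Moment-based skewness g1 = m3 / m2**1.5 (no bias correction)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    if m2 == 0:
        return 0.0  # constant column: treat skewness as zero in this sketch
    return m3 / m2 ** 1.5


def zero_share(values):
    """Fraction of entries that are exactly zero."""
    return sum(1 for v in values if v == 0) / len(values)


def column_warnings(name, values, skew_limit=20.0, zero_limit=0.01):
    """Return warning strings for one column (both limits are assumptions)."""
    found = []
    g1 = skewness(values)
    if abs(g1) > skew_limit:
        found.append(f"{name} is highly skewed (g1 = {g1:.4f})")
    share = zero_share(values)
    if share > zero_limit:
        found.append(f"{name} has {share:.1%} zeros")
    return found


sample = [0, 0, 0, 0, 10]
print(skewness(sample), zero_share(sample))
print(column_warnings("X", sample, skew_limit=1.0, zero_limit=0.5))
```

A long right tail, as in X_10 and X_12 where almost all values are 1 and the maximum is 90, pushes the third central moment far above zero, which is what produces the large γ1 values reported.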

Variables

df_index
Real number (ℝ≥0)

Distinct count: 23856
Unique (%): 60.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 10336.960009054554
Minimum: 0
Maximum: 23855
Zeros: 2
Zeros (%): < 0.1%
Memory size: 310.6 KiB

Quantile statistics

Minimum: 0
5-th percentile: 993.9
Q1: 4969.5
Median: 9939
Q3: 14909
95-th percentile: 21867.1
Maximum: 23855
Range: 23855
Interquartile range (IQR): 9939.5

Descriptive statistics

Standard deviation: 6378.244484
Coefficient of variation (CV): 0.617032907
Kurtosis: -0.893746517
Mean: 10336.96001
Median Absolute Deviation (MAD): 4970
Skewness: 0.2791311245
Sum: 410987193
Variance: 40682002.69
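
Several of the descriptive statistics are tied together by simple identities: the variance is the square of the standard deviation, and the coefficient of variation is the standard deviation divided by the mean. The sketch below shows these relations with Python's standard library. It assumes the sample (n - 1) standard deviation and reads MAD as the median of absolute deviations from the median; both are assumptions about how the report computes its figures, and the `describe` helper is a hypothetical name.

```python
from statistics import mean, median, stdev


def describe(values):
    """Return a few of the descriptive statistics listed in the report."""
    mu = mean(values)
    sd = stdev(values)            # sample standard deviation (n - 1 denominator)
    med = median(values)
    return {
        "mean": mu,
        "std": sd,
        "variance": sd ** 2,      # variance is the squared standard deviation
        "cv": sd / mu,            # undefined when the mean is zero
        "mad": median(abs(v - med) for v in values),
    }


stats = describe([2, 4, 4, 4, 5, 5, 7, 9])
for key, value in stats.items():
    print(key, value)
```

The same identities can be checked against the figures listed for df_index: 6378.244484 / 10336.96001 is approximately 0.617033, matching the reported CV.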
Histogram with fixed size bins (bins=10)

Value                 Count  Frequency (%)
2047                  2      < 0.1%
5646                  2      < 0.1%
9768                  2      < 0.1%
11817                 2      < 0.1%
13866                 2      < 0.1%
1580                  2      < 0.1%
3629                  2      < 0.1%
5678                  2      < 0.1%
7727                  2      < 0.1%
9800                  2      < 0.1%
Other values (23846)  39739  99.9%

Value  Count  Frequency (%)
0      2      < 0.1%
1      2      < 0.1%
2      2      < 0.1%
3      2      < 0.1%
4      2      < 0.1%

Value  Count  Frequency (%)
23855  1      < 0.1%
23854  1      < 0.1%
23853  1      < 0.1%
23852  1      < 0.1%
23851  1      < 0.1%

INCIDENT_ID
Categorical

UNIQUE

Distinct count: 39759
Unique (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 310.6 KiB

Value                 Count  Frequency (%)
CR_69942              1      < 0.1%
CR_188343             1      < 0.1%
CR_84718              1      < 0.1%
CR_89395              1      < 0.1%
CR_24753              1      < 0.1%
CR_108827             1      < 0.1%
CR_30379              1      < 0.1%
CR_80628              1      < 0.1%
CR_93834              1      < 0.1%
CR_184355             1      < 0.1%
Other values (39749)  39749  > 99.9%

Length

Max length: 9
Median length: 8
Mean length: 8.444201313
Min length: 4

DATE
Categorical

HIGH CARDINALITY

Distinct count: 9942
Unique (%): 25.0%
Missing: 0
Missing (%): 0.0%
Memory size: 310.6 KiB

Value                Count  Frequency (%)
13-SEP-01            36     0.1%
12-SEP-01            34     0.1%
15-SEP-01            27     0.1%
17-SEP-01            25     0.1%
11-SEP-01            24     0.1%
14-SEP-01            22     0.1%
18-SEP-01            19     < 0.1%
20-SEP-01            17     < 0.1%
01-MAY-92            17     < 0.1%
19-SEP-01            16     < 0.1%
Other values (9932)  39522  99.4%

Length

Max length: 9
Median length: 9
Mean length: 9
Min length: 9

X_1
Real number (ℝ≥0)

ZEROS

Distinct count: 8
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.47750194924419626
Minimum: 0
Maximum: 7
Zeros: 31814
Zeros (%): 80.0%
Memory size: 310.6 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 0
95-th percentile: 3
Maximum: 7
Range: 7
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 1.428754965
Coefficient of variation (CV): 2.992144779
Kurtosis: 13.88980442
Mean: 0.4775019492
Median Absolute Deviation (MAD): 0
Skewness: 3.815518275
Sum: 18985
Variance: 2.041340749
Histogram with fixed size bins (bins=10)

Value  Count  Frequency (%)
0      31814  80.0%
1      5761   14.5%
7      1426   3.6%
5      458    1.2%
3      228    0.6%
4      48     0.1%
2      17     < 0.1%
6      7      < 0.1%

Value  Count  Frequency (%)
0      31814  80.0%
1      5761   14.5%
2      17     < 0.1%
3      228    0.6%
4      48     0.1%

Value  Count  Frequency (%)
7      1426   3.6%
6      7      < 0.1%
5      458    1.2%
4      48     0.1%
3      228    0.6%

X_2
Real number (ℝ≥0)

HIGH CORRELATION

Distinct count: 52
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 24.763776754948566
Minimum: 0
Maximum: 52
Zeros: 40
Zeros (%): 0.1%
Memory size: 310.6 KiB

Quantile statistics

Minimum: 0
5-th percentile: 4
Q1: 7
Median: 24
Q3: 36
95-th percentile: 48
Maximum: 52
Range: 52
Interquartile range (IQR): 29

Descriptive statistics

Standard deviation: 15.23552157
Coefficient of variation (CV): 0.6152341673
Kurtosis: -1.307501292
Mean: 24.76377675
Median Absolute Deviation (MAD): 13
Skewness: -0.09338549112
Sum: 984583
Variance: 232.1211175
Histogram with fixed size bins (bins=10)

Value              Count  Frequency (%)
4                  6724   16.9%
36                 3657   9.2%
33                 3573   9.0%
24                 2257   5.7%
21                 2088   5.3%
37                 1606   4.0%
45                 1545   3.9%
49                 1486   3.7%
3                  1307   3.3%
22                 1091   2.7%
Other values (42)  14425  36.3%

Value  Count  Frequency (%)
0      40     0.1%
1      33     0.1%
2      194    0.5%
3      1307   3.3%
4      6724   16.9%

Value  Count  Frequency (%)
52     25     0.1%
51     162    0.4%
50     279    0.7%
49     1486   3.7%
48     98     0.2%

X_3
Real number (ℝ≥0)

HIGH CORRELATION

Distinct count: 52
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 24.61249025377902
Minimum: 0
Maximum: 52
Zeros: 33
Zeros (%): 0.1%
Memory size: 310.6 KiB

Quantile statistics

Minimum: 0
5-th percentile: 4
Q1: 8
Median: 24
Q3: 35
95-th percentile: 48
Maximum: 52
Range: 52
Interquartile range (IQR): 27

Descriptive statistics

Standard deviation: 15.13187718
Coefficient of variation (CV): 0.6148048011
Kurtosis: -1.23901782
Mean: 24.61249025
Median Absolute Deviation (MAD): 13
Skewness: -0.08088102605
Sum: 978568
Variance: 228.9737069
Histogram with fixed size bins (bins=10)

Value              Count  Frequency (%)
4                  6724   16.9%
34                 3657   9.2%
32                 3573   9.0%
24                 2257   5.7%
23                 2088   5.3%
37                 1606   4.0%
45                 1545   3.9%
49                 1486   3.7%
2                  1307   3.3%
22                 1091   2.7%
Other values (42)  14425  36.3%

Value  Count  Frequency (%)
0      33     0.1%
1      40     0.1%
2      1307   3.3%
3      194    0.5%
4      6724   16.9%

Value  Count  Frequency (%)
52     25     0.1%
51     279    0.7%
50     162    0.4%
49     1486   3.7%
48     1066   2.7%

X_4
Real number (ℝ≥0)

ZEROS

Distinct count: 10
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4.279735405820066
Minimum: 0
Maximum: 10
Zeros: 5588
Zeros (%): 14.1%
Memory size: 310.6 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 2
Median: 4
Q3: 6
95-th percentile: 10
Maximum: 10
Range: 10
Interquartile range (IQR): 4

Descriptive statistics

Standard deviation: 2.956637769
Coefficient of variation (CV): 0.6908459259
Kurtosis: -1.018315231
Mean: 4.279735406
Median Absolute Deviation (MAD): 2
Skewness: 0.1871304045
Sum: 170158
Variance: 8.741706897
Histogram with fixed size bins (bins=10)

Value  Count  Frequency (%)
6      9078   22.8%
2      7883   19.8%
0      5588   14.1%
7      4781   12.0%
4      3369   8.5%
3      3160   7.9%
9      2320   5.8%
10     2113   5.3%
1      1461   3.7%
5      6      < 0.1%

Value  Count  Frequency (%)
0      5588   14.1%
1      1461   3.7%
2      7883   19.8%
3      3160   7.9%
4      3369   8.5%

Value  Count  Frequency (%)
10     2113   5.3%
9      2320   5.8%
7      4781   12.0%
6      9078   22.8%
5      6      < 0.1%

X_5
Real number (ℝ≥0)

ZEROS

Distinct count: 5
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.4527528358359114
Minimum: 0
Maximum: 5
Zeros: 7908
Zeros (%): 19.9%
Memory size: 310.6 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 1
Median: 3
Q3: 5
95-th percentile: 5
Maximum: 5
Range: 5
Interquartile range (IQR): 4

Descriptive statistics

Standard deviation: 1.96318419
Coefficient of variation (CV): 0.8004003343
Kurtosis: -1.556820375
Mean: 2.452752836
Median Absolute Deviation (MAD): 2
Skewness: 0.1743576897
Sum: 97519
Variance: 3.854092163
Histogram with fixed size bins (bins=10)

Value  Count  Frequency (%)
5      12238  30.8%
1      11252  28.3%
3      8355   21.0%
0      7908   19.9%
2      6      < 0.1%

Value  Count  Frequency (%)
0      7908   19.9%
1      11252  28.3%
2      6      < 0.1%
3      8355   21.0%
5      12238  30.8%

Value  Count  Frequency (%)
5      12238  30.8%
3      8355   21.0%
2      6      < 0.1%
1      11252  28.3%
0      7908   19.9%

X_6
Real number (ℝ≥0)

Distinct count: 19
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 6.126461933147212
Minimum: 1
Maximum: 19
Zeros: 0
Zeros (%): 0.0%
Memory size: 310.6 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 3
Median: 5
Q3: 8
95-th percentile: 15
Maximum: 19
Range: 18
Interquartile range (IQR): 5

Descriptive statistics

Standard deviation: 4.463585046
Coefficient of variation (CV): 0.7285746806
Kurtosis: 0.06079304921
Mean: 6.126461933
Median Absolute Deviation (MAD): 3
Skewness: 0.9704193921
Sum: 243582
Variance: 19.92359146
Histogram with fixed size bins (bins=10)

Value             Count  Frequency (%)
1                 5794   14.6%
5                 4476   11.3%
6                 4390   11.0%
4                 3869   9.7%
2                 3863   9.7%
15                3822   9.6%
7                 3728   9.4%
3                 2909   7.3%
8                 2356   5.9%
9                 2098   5.3%
Other values (9)  2454   6.2%

Value  Count  Frequency (%)
1      5794   14.6%
2      3863   9.7%
3      2909   7.3%
4      3869   9.7%
5      4476   11.3%

Value  Count  Frequency (%)
19     5      < 0.1%
18     264    0.7%
17     183    0.5%
16     1026   2.6%
15     3822   9.6%

X_7
Real number (ℝ≥0)

ZEROS

Distinct count: 19
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4.870947458437083
Minimum: 0
Maximum: 18
Zeros: 5794
Zeros (%): 14.6%
Memory size: 310.6 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 2
Median: 4
Q3: 7
95-th percentile: 12
Maximum: 18
Range: 18
Interquartile range (IQR): 5

Descriptive statistics

Standard deviation: 3.870959307
Coefficient of variation (CV): 0.7947035644
Kurtosis: 0.5203116861
Mean: 4.870947458
Median Absolute Deviation (MAD): 3
Skewness: 0.7988064995
Sum: 193664
Variance: 14.98432596
Histogram with fixed size bins (bins=10)

Value             Count  Frequency (%)
0                 5794   14.6%
6                 4476   11.3%
4                 4390   11.0%
2                 3869   9.7%
7                 3863   9.7%
10                3822   9.6%
1                 3728   9.4%
5                 2909   7.3%
3                 2356   5.9%
8                 2098   5.3%
Other values (9)  2454   6.2%

Value  Count  Frequency (%)
0      5794   14.6%
1      3728   9.4%
2      3869   9.7%
3      2356   5.9%
4      4390   11.0%

Value  Count  Frequency (%)
18     240    0.6%
17     327    0.8%
16     339    0.9%
15     39     0.1%
14     31     0.1%

X_8
Real number (ℝ≥0)

ZEROS

Distinct count: 27
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.9781684650016349
Minimum: 0
Maximum: 99
Zeros: 14634
Zeros (%): 36.8%
Memory size: 310.6 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 1
Q3: 1
95-th percentile: 3
Maximum: 99
Range: 99
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 1.46042113
Coefficient of variation (CV): 1.49301596
Kurtosis: 652.7401544
Mean: 0.978168465
Median Absolute Deviation (MAD): 1
Skewness: 14.4255057
Sum: 38891
Variance: 2.132829876
Histogram with fixed size bins (bins=10)

Value              Count  Frequency (%)
1                  18329  46.1%
0                  14634  36.8%
2                  3772   9.5%
3                  1592   4.0%
4                  673    1.7%
5                  350    0.9%
6                  152    0.4%
7                  61     0.2%
8                  54     0.1%
10                 41     0.1%
Other values (17)  101    0.3%

Value  Count  Frequency (%)
0      14634  36.8%
1      18329  46.1%
2      3772   9.5%
3      1592   4.0%
4      673    1.7%

Value  Count  Frequency (%)
99     1      < 0.1%
50     3      < 0.1%
40     1      < 0.1%
30     2      < 0.1%
29     1      < 0.1%

X_9
Real number (ℝ≥0)

Distinct count: 7
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4.917980834528032
Minimum: 0
Maximum: 6
Zeros: 200
Zeros (%): 0.5%
Memory size: 310.6 KiB

Quantile statistics

Minimum: 0
5-th percentile: 2
Q1: 5
Median: 5
Q3: 6
95-th percentile: 6
Maximum: 6
Range: 6
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 1.367461734
Coefficient of variation (CV): 0.2780534899
Kurtosis: 1.252125374
Mean: 4.917980835
Median Absolute Deviation (MAD): 1
Skewness: -1.517828575
Sum: 195534
Variance: 1.869951595
Histogram with fixed size bins (bins=10)

Value  Count  Frequency (%)
5      17610  44.3%
6      15781  39.7%
2      5091   12.8%
3      762    1.9%
1      310    0.8%
0      200    0.5%
4      5      < 0.1%

Value  Count  Frequency (%)
0      200    0.5%
1      310    0.8%
2      5091   12.8%
3      762    1.9%
4      5      < 0.1%

Value  Count  Frequency (%)
6      15781  39.7%
5      17610  44.3%
4      5      < 0.1%
3      762    1.9%
2      5091   12.8%

X_10
Real number (ℝ≥0)

SKEWED

Distinct count: 26
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.243366281848135
Minimum: 1
Maximum: 90
Zeros: 0
Zeros (%): 0.0%
Memory size: 310.6 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 1
Median: 1
Q3: 1
95-th percentile: 2
Maximum: 90
Range: 89
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 1.017419435
Coefficient of variation (CV): 0.8182781294
Kurtosis: 2000.81086
Mean: 1.243366282
Median Absolute Deviation (MAD): 0
Skewness: 30.92348051
Sum: 49435
Variance: 1.035142307
Histogram with fixed size bins (bins=10)

Value              Count  Frequency (%)
1                  33618  84.6%
2                  4532   11.4%
3                  924    2.3%
4                  364    0.9%
5                  114    0.3%
6                  92     0.2%
8                  25     0.1%
10                 25     0.1%
7                  23     0.1%
9                  11     < 0.1%
Other values (16)  31     0.1%

Value  Count  Frequency (%)
1      33618  84.6%
2      4532   11.4%
3      924    2.3%
4      364    0.9%
5      114    0.3%

Value  Count  Frequency (%)
90     1      < 0.1%
58     1      < 0.1%
50     1      < 0.1%
40     2      < 0.1%
30     1      < 0.1%

X_11
Real number (ℝ≥0)

ZEROS

Distinct count: 150
Unique (%): 0.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 206.95434995849996
Minimum: 0
Maximum: 332
Zeros: 4268
Zeros (%): 10.7%
Memory size: 310.6 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 174
Median: 249
Q3: 249
95-th percentile: 316
Maximum: 332
Range: 332
Interquartile range (IQR): 75

Descriptive statistics

Standard deviation: 93.0619573
Coefficient of variation (CV): 0.4496738403
Kurtosis: 0.192539772
Mean: 206.95435
Median Absolute Deviation (MAD): 67
Skewness: -0.9031502716
Sum: 8228298
Variance: 8660.527897
Histogram with fixed size bins (bins=10)

Value               Count  Frequency (%)
174                 12100  30.4%
249                 11552  29.1%
316                 7577   19.1%
0                   4268   10.7%
303                 707    1.8%
127                 519    1.3%
179                 357    0.9%
74                  334    0.8%
102                 208    0.5%
263                 176    0.4%
Other values (140)  1961   4.9%

Value  Count  Frequency (%)
0      4268   10.7%
1      3      < 0.1%
6      3      < 0.1%
11     7      < 0.1%
12     1      < 0.1%

Value  Count  Frequency (%)
332    4      < 0.1%
330    39     0.1%
329    31     0.1%
328    120    0.3%
327    2      < 0.1%

X_12
Real number (ℝ≥0)

SKEWED
ZEROS

Distinct count: 24
Unique (%): 0.1%
Missing: 309
Missing (%): 0.8%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.9733333333333334
Minimum: 0.0
Maximum: 90.0
Zeros: 8517
Zeros (%): 21.4%
Memory size: 310.6 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 1
Median: 1
Q3: 1
95-th percentile: 2
Maximum: 90
Range: 90
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 1.060944796
Coefficient of variation (CV): 1.090011777
Kurtosis: 1710.391344
Mean: 0.9733333333
Median Absolute Deviation (MAD): 0
Skewness: 26.54109191
Sum: 38398
Variance: 1.12560386
Histogram with fixed size bins (bins=10)

Value              Count  Frequency (%)
1                  26204  65.9%
0                  8517   21.4%
2                  3420   8.6%
3                  797    2.0%
4                  276    0.7%
5                  101    0.3%
6                  59     0.1%
8                  18     < 0.1%
7                  14     < 0.1%
10                 11     < 0.1%
Other values (14)  33     0.1%
(Missing)          309    0.8%

Value  Count  Frequency (%)
0      8517   21.4%
1      26204  65.9%
2      3420   8.6%
3      797    2.0%
4      276    0.7%

Value  Count  Frequency (%)
90     1      < 0.1%
58     1      < 0.1%
50     1      < 0.1%
40     2      < 0.1%
30     1      < 0.1%

X_13
Real number (ℝ≥0)

Distinct count: 68
Unique (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 85.21886868382002
Minimum: 0
Maximum: 117
Zeros: 2
Zeros (%): < 0.1%
Memory size: 310.6 KiB

Quantile statistics

Minimum: 0
5-th percentile: 18
Q1: 72
Median: 98
Q3: 103
95-th percentile: 112
Maximum: 117
Range: 117
Interquartile range (IQR): 31

Descriptive statistics

Standard deviation: 27.55532481
Coefficient of variation (CV): 0.3233476956
Kurtosis: 1.1341156
Mean: 85.21886868
Median Absolute Deviation (MAD): 11
Skewness: -1.398063774
Sum: 3388217
Variance: 759.2959255
Histogram with fixed size bins (bins=10)

Value              Count  Frequency (%)
103                11775  29.6%
72                 7612   19.1%
92                 5353   13.5%
112                3468   8.7%
98                 2307   5.8%
18                 1399   3.5%
24                 886    2.2%
109                848    2.1%
12                 702    1.8%
59                 560    1.4%
Other values (58)  4849   12.2%

Value  Count  Frequency (%)
0      2      < 0.1%
1      8      < 0.1%
2      382    1.0%
7      2      < 0.1%
8      3      < 0.1%

Value  Count  Frequency (%)
117    1      < 0.1%
116    466    1.2%
115    31     0.1%
114    20     0.1%
113    367    0.9%

X_14
Real number (ℝ≥0)

ZEROS

Distinct count: 69
Unique (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 72.49201438667974
Minimum: 0
Maximum: 142
Zeros: 458
Zeros (%): 1.2%
Memory size: 310.6 KiB

Quantile statistics

Minimum: 0
5-th percentile: 25
Q1: 29
Median: 62
Q3: 107
95-th percentile: 142
Maximum: 142
Range: 142
Interquartile range (IQR): 78

Descriptive statistics

Standard deviation: 43.35376456
Coefficient of variation (CV): 0.5980488323
Kurtosis: -1.324487842
Mean: 72.49201439
Median Absolute Deviation (MAD): 33
Skewness: 0.2532434153
Sum: 2882210
Variance: 1879.548901
Histogram with fixed size bins (bins=10)

Value              Count  Frequency (%)
29                 13659  34.4%
93                 5140   12.9%
142                4557   11.5%
62                 4070   10.2%
80                 2529   6.4%
130                1976   5.0%
107                1234   3.1%
14                 1158   2.9%
119                943    2.4%
103                842    2.1%
Other values (59)  3651   9.2%

Value  Count  Frequency (%)
0      458    1.2%
2      1      < 0.1%
6      213    0.5%
10     1      < 0.1%
12     2      < 0.1%

Value  Count  Frequency (%)
142    4557   11.5%
140    108    0.3%
139    13     < 0.1%
138    227    0.6%
136    101    0.3%

X_15
Real number (ℝ≥0)

ZEROS

Distinct count: 36
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 33.44789858899872
Minimum: 0
Maximum: 50
Zeros: 1680
Zeros (%): 4.2%
Memory size: 310.6 KiB

Quantile statistics

Minimum: 0
5-th percentile: 23
Q1: 34
Median: 34
Q3: 34
95-th percentile: 46
Maximum: 50
Range: 50
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 8.357811091
Coefficient of variation (CV): 0.2498755211
Kurtosis: 8.811395375
Mean: 33.44789859
Median Absolute Deviation (MAD): 0
Skewness: -2.54436585
Sum: 1329855
Variance: 69.85300624
Histogram with fixed size bins (bins=10)

Value              Count  Frequency (%)
34                 31646  79.6%
43                 2504   6.3%
0                  1680   4.2%
46                 1079   2.7%
23                 1063   2.7%
48                 864    2.2%
36                 307    0.8%
50                 217    0.5%
9                  170    0.4%
39                 82     0.2%
Other values (26)  147    0.4%

Value  Count  Frequency (%)
0      1680   4.2%
1      1      < 0.1%
2      1      < 0.1%
3      1      < 0.1%
4      4      < 0.1%

Value  Count  Frequency (%)
50     217    0.5%
48     864    2.2%
47     1      < 0.1%
46     1079   2.7%
43     2504   6.3%

MULTIPLE_OFFENSE
Boolean

MISSING

Distinct count: 2
Unique (%): < 0.1%
Missing: 15903
Missing (%): 40.0%
Memory size: 310.6 KiB

Value      Count  Frequency (%)
1          22788  57.3%
0          1068   2.7%
(Missing)  15903  40.0%

is_test_data
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 310.6 KiB

Value  Count  Frequency (%)
0      23856  60.0%
1      15903  40.0%

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
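
The calculation described above can be sketched in a few lines of plain Python. The `pearson_r` name is a hypothetical helper for illustration, not part of the profiling tool; the 1/n factors of the covariance and of the two variances cancel, so the sketch works with raw sums.

```python
from math import sqrt


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations; the 1/n factors cancel.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


x = [1, 2, 3, 4]
y = [2, 4, 6, 8]
print(pearson_r(x, y))
# Shifting and rescaling y by a positive factor leaves r unchanged.
print(pearson_r(x, [3 * v + 7 for v in y]))
```

The sketch does not guard against a constant column, for which the denominator is zero and r is undefined.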

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of the standard deviations of those rank variables.
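
As a minimal sketch of that recipe: replace each value by its rank, then apply the Pearson formula to the ranks. The helper names are hypothetical, and giving tied values the average of their ranks is an assumption about tie handling rather than something the report states.

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Pearson correlation computed on the rank variables of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / sqrt(var_x * var_y)


# A cubic relation is nonlinear but monotonic, so the ranks agree perfectly.
print(spearman_rho([1, 2, 3, 4], [1, 8, 27, 64]))
```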

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
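
The pair-counting definition translates directly into code. The sketch below implements exactly the formula stated above, with tied pairs counted as neither concordant nor discordant; library implementations often apply a tie correction, so their values can differ on data with many ties. The `kendall_tau` name is a hypothetical helper.

```python
from itertools import combinations


def kendall_tau(x, y):
    """(concordant - discordant) / total pairs, with no tie correction."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1      # both variables move in the same direction
        elif direction < 0:
            discordant += 1      # the variables move in opposite directions
        # direction == 0 means a tie in x or y: counted in neither group
    total_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / total_pairs


print(kendall_tau([1, 2, 3], [1, 3, 2]))
```

The double loop over pairs is quadratic in the number of observations, which is fine for an illustration but slow for tens of thousands of rows.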

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available for the coefficient.

Missing values

Sample

First rows

df_index INCIDENT_ID DATE X_1 X_2 X_3 X_4 X_5 X_6 X_7 X_8 X_9 X_10 X_11 X_12 X_13 X_14 X_15 MULTIPLE_OFFENSE is_test_data
00CR_10265904-JUL-040363421561611741.09229360.00
11CR_18975218-JUL-17137370011171612361.0103142341.00
22CR_18463715-MAR-1703235102311741.011093341.00
33CR_13907113-FEB-090333221711612491.07229341.00
44CR_10933513-APR-050333221830511740.011229431.00
55CR_9626307-APR-0304545103101613031.07262341.00
66CR_13140022-JAN-080303573710511740.011229431.00
77CR_1198114-MAY-9308773980513161.07262341.00
88CR_18413421-AUG-160494965831113161.010314341.00
99CR_3263425-AUG-961446515100521451.010329340.00

Last rows

df_index INCIDENT_ID DATE X_1 X_2 X_3 X_4 X_5 X_6 X_7 X_8 X_9 X_10 X_11 X_12 X_13 X_14 X_15 MULTIPLE_OFFENSE is_test_data
3974915893CR_14837522-OCT-1103235100522492.01038034NaN1
3975015894CR_6773629-JUN-000212341640511741.0989334NaN1
3975115895CR_18589029-JUN-1705535831612491.0722934NaN1
3975215896CR_8986811-MAY-03032351016101.0722934NaN1
3975315897CR_14834301-SEP-1103235103613031.0722934NaN1
3975415898CR_4446828-NOV-97122227315100511740.0722943NaN1
3975515899CR_15846009-JUN-1203530351023202.0729334NaN1
3975615900CR_11594622-APR-0602627906426101.0726234NaN1
3975715901CR_13766303-APR-090212341271622492.0926234NaN1
3975815902CR_3354524-APR-9604465425612491.0722934NaN1